Apparatus for continuous compression of large volumes of data

ABSTRACT

A system for efficiently transmitting data from a first site to a remote site over a communication medium. The system includes a storage for storing data in sub-segment boundaries, such that a few sub-segments are accommodated in each block. The system further includes a storage for storing data including signature data. Each one of the sub-segments is associated with a signature of considerably smaller size than its respective sub-segment. The system includes a processor configured to perform the following, as many times as required: receiving a block and partitioning it into sub-segments; for each sub-segment in the block, calculating a signature; determining whether the calculated signature matches a corresponding signature, if any, stored in the signature storage; and, in case of no match (indicating that the sub-segment is new or has been modified), transmitting the sub-segment to the remote site and storing the calculated signature in the signature storage.

FIELD OF THE INVENTION

The field of the invention is data compression. Specifically, this invention is related to storage communications and compression of replication and backup traffic.

BACKGROUND OF THE INVENTION

In many communications systems, there is a need to transfer digital data over a communication medium. In several applications, most of the data is transferred over and over to the remote side with only a small fraction of the data changed. These applications include replication, backup, and data migration. For example, if a certain disk is replicated over a network to a remote site, then for most replication techniques, even if only a single bit is modified, a whole block is transferred to the remote site.

Signatures are a generic name for hash-style functions that map a relatively large data object (e.g., 2048 bytes) to a small number of bits (e.g., 64 bits). These functions have the following property: when the large object changes by a little, the value of the map changes considerably. Hash functions (e.g., MD5, SHA-1, HMAC) are extensively used in many applications as a means to store data quickly and efficiently and for data integrity purposes.
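The property described above can be illustrated with a minimal sketch. The helper name signature64 and the choice of truncating SHA-1 to 64 bits are assumptions made for illustration only; the text does not prescribe a particular signature function.

```python
import hashlib


def signature64(data: bytes) -> int:
    """Hypothetical 64-bit signature: the first 8 bytes of a SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


# A 2048-byte object and a copy that differs in exactly one bit.
original = bytes(2048)
modified = bytearray(original)
modified[1000] ^= 0x01

sig_a = signature64(original)
sig_b = signature64(bytes(modified))

# Count how many of the 64 signature bits differ between the two objects.
differing_bits = bin(sig_a ^ sig_b).count("1")
print(f"{sig_a:016x} vs {sig_b:016x}: {differing_bits} of 64 bits differ")
```

A single flipped input bit is expected to change a large share of the signature bits, which is what makes comparing signatures a usable stand-in for comparing the data itself.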

In FIG. 1 a, the situation in current storage sub-systems is demonstrated. The nodes Host 1 and Host 2 communicate with Disk 3 using local communication 4. Typically, the disk is a storage sub-system (e.g., RAID disk) and the local communication lines are either Local Area Network (LAN) or Storage Area Network (SAN). When each host writes information to the disk, it is also sent over the Wide Area Network 5 to a remote backup system (instead of a Wide Area Network, a Metropolitan Area Network or dedicated communication lines may be used). The problem with the specified configuration is that for every bit changed a block is sent over the network lines. This is not only expensive, but also causes considerable delay and slowdowns. A second configuration, which is common today, is shown in FIG. 1 b. In that configuration the storage system itself communicates over the Wide Area Network with the remote system. Still, whenever a block is written on the storage sub-system it is transmitted over the network lines.

Glossary:

There follows a glossary of terms. The invention is not bound by these particular definitions, which are provided for convenience only.

Segment: A segment is a unit of data that is transferred from the host to the storage system. This includes disk tracks and file system blocks. For example, a segment may be a block of size 16 KB.

Sub-segment: A part of a segment. Sub-segments may vary in size and need not be of equal size. For example, a sub-segment may be a part of size 1 KB. The size of a sub-segment may differ from segment to segment and depend on content, location in the storage sub-system and so forth.

Signature function: A signature function is a mapping from sub-segments to signatures. A signature is of size of e.g. 64-128 bits while the sub-segment is of size of e.g. hundreds to thousands of bytes. The signature function maps, with high probability, two sub-segments that differ only slightly to different signatures. Typical yet not exclusive examples of signature functions are CRC (Cyclic Redundancy Code), hash functions such as MD2, MD4, MD5, SHA, SHA-1, various types of checksum, hash functions that are based on a block cipher (e.g. the Davies-Meyer hash function), RIPEMD-160, HAVAL.

Signature: A collection of bits that is the result of applying the signature function to a sub-segment. This collection of bits distinguishes with high probability between two sub-segments.

Communication medium: Physical and logical devices used to transfer bits from one place to another. For instance, Internet Protocol (IP) over Wide Area Network (WAN), leased-line communications, Fibre Channel and so forth.

Volume: A collection of segments that logically belong to the same application and possibly share common characteristics.

SUMMARY OF THE INVENTION

By one aspect of the invention, when a data segment enters the compression system it is partitioned into sub-segments. A list of signatures per data sub-segment is maintained. Each signature is the result of applying a signature function (such as a hash function) to the value of the sub-segment. When a segment is to be transferred over the communication lines, it is examined whether the segment contains sub-segments that were not modified. This examination is performed efficiently by calculating the signature of each sub-segment. If the signature of a given sub-segment matches the stored signature of the same sub-segment (that was already transferred to a remote site), then there is no need to re-transfer the sub-segment. Compression is achieved by not sending data that was not changed. The signature mechanism enables comparison against a large amount of data without storing all that data in memory, but only its signatures.

The invention provides for a system for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks of data; the system comprising:

storage for storing data in sub-segment boundaries, such that at least one sub-segment is accommodated in each block;

storage for storing data including signature data; each one of said sub-segments is associated with at least one signature; each signature has a signature size considerably smaller than its respective sub-segment size;

the system includes a processor configured to perform at least the following, as many times as required:

receiving a block and in the case it accommodates more than one sub-segment partitioning it into sub-segments;

for each sub-segment in the block calculating at least one signature;

determining whether the calculated signature matches the corresponding signature, if any, stored in the signature storage, and in case of no match, indicating that the sub-segment is new or has been modified, transmitting the sub-segment or derivative thereof to at least one of said remote sites, and storing the calculated signature in the signature storage.

The invention further provides for a processor for operating in a system for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks of data;

the system includes storage for storing data in sub-segment boundaries, such that at least one sub-segment is accommodated in each block; the system further includes storage for storing data including signature data; each one of said sub-segments is associated with at least one signature; each signature has a signature size considerably smaller than its respective sub-segment size;

the processor configured to perform at least the following, as many times as required:

receiving a block and in the case it accommodates more than one sub-segment partitioning it into sub-segments;

for each sub-segment in the block calculating at least one signature;

determining whether the calculated signature is identical to the corresponding signature, if any, stored in the signature storage, and in case of no match, indicating that the sub-segment is new or has been modified, transmitting the sub-segment or derivative thereof to at least one of said remote sites, and storing the calculated signature in the signature storage.

Still further, the invention provides for a method for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks; the method comprising:

receiving a succession of blocks and partitioning each into sub-segments, if required;

processing the sub-segments and transmitting to the at least one remote site only those sub-segments whose associated signature indicates that they were changed.

Yet further, the invention provides for a method for processing data to generate compressed data for transmission over a communication medium, comprising:

processing successions of data portions and identifying those portions which were changed;

generating compressed data that includes the data portions which were changed, and transmitting the compressed data over the communication medium.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

FIG. 1 a is an example of a currently widespread architecture;

FIG. 1 b is an example of a known common architecture;

FIG. 2 describes a system architecture in accordance with an embodiment of the invention;

FIG. 3 describes a more detailed system architecture in accordance with an embodiment of the invention;

FIG. 4 illustrates a flow diagram of the operational steps carried out in a system according to one embodiment of the invention;

FIG. 5 illustrates a flow chart of the operational steps of the signature calculation and retrieval process, in accordance with an embodiment of the invention;

FIG. 6 illustrates a system architecture of so-called context switching, in accordance with an embodiment of the invention; and

FIGS. 7A-C illustrate three distinct embodiments of different system architectures.

DETAILED DESCRIPTION OF THE INVENTION

Attention is first drawn to FIG. 2 illustrating a system architecture in accordance with an embodiment of the invention.

In accordance with the system architecture 20 of FIG. 2, every data segment that is written is transferred from the host (e.g. 21 or 22), through local network 24, both to the storage sub-system 23 and to the compression engine 25. After having been processed in compression engine 25 (in a manner that will be described in more detail below), the data is sent from the compression engine 25 over the Wide Area Network 26 for storage.

Note that an important difference from prior art solutions, such as the one described with reference to FIG. 1 a, is that instead of sending the data directly over the Wide Area Network, the data is first processed in the compression engine and only if required is the data transmitted over the Wide Area Network. This allows considerable bandwidth reduction. For example, one may consider the following scenario. Suppose that the segments are blocks and that every block is of size of e.g. 32 KB. For exemplary transactional database blocks, a change may happen in e.g. two locations, say a first location in the first few bytes (in the header section) and a second location inside the block. Note that the number of sub-segments that vary in each block (if at all) depends on the particular application.

Reverting now to the example above, by partitioning the block into sub-segments of, say, size 1 KB, the compression engine 25 determines that only the first sub-segment, which accommodates the header section, should be transmitted over the network 26 and that additionally one or possibly two more sub-segments that accommodate the data stored in the second location should be transmitted over the network 26. Note that transmitting two additional (rather than one) sub-segments would be required only if the modified data in the second location are not wholly contained in one sub-segment but rather overflow into another sub-segment. It would thus be appreciated that in the specified scenario it is more likely that only two sub-segments need to be transmitted over the Wide Area Network 26. As may be recalled, in accordance with the specified prior art solution all the sub-segments (i.e. 32) are transmitted. This leads to a compression rate of 1:16 (in the case that two sub-segments are transmitted) or about 1:10 (in the case that three sub-segments are transmitted) per block.
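The arithmetic behind these ratios can be checked with a short sketch, assuming the 32 KB block and 1 KB sub-segments of the example. Sending two of the 32 sub-segments gives 1:16, and sending three gives roughly 1:10. The constant and function names are illustrative only.

```python
BLOCK_SIZE = 32 * 1024      # block size used in the example (32 KB)
SUB_SEGMENT_SIZE = 1024     # sub-segment size used in the example (1 KB)


def compression_ratio(changed_sub_segments: int) -> float:
    """Sub-segments in a block divided by sub-segments actually sent."""
    total_sub_segments = BLOCK_SIZE // SUB_SEGMENT_SIZE
    return total_sub_segments / changed_sub_segments


for changed in (2, 3):
    ratio = compression_ratio(changed)
    print(f"{changed} changed sub-segments -> about 1:{ratio:.1f}")
```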

For a better understanding of the foregoing, attention is directed to FIG. 3 illustrating a more detailed system architecture in accordance with an embodiment of the invention. Thus, a storage network gateway 31 receives data from the hosts (of which two, i.e. 21 and 22, are shown in FIG. 2). The gateway 31 is coupled to module 32, which in turn is coupled to signature database 34, signature calculation 33 and network gateway 35. Note that by this embodiment module 32, signature database 34 and signature calculation 33 form part of the compression engine 25 of FIG. 2. Those versed in the art will readily appreciate that the system architecture of the invention is not bound to the specific embodiments of FIGS. 2 and 3. Thus, by way of a specific embodiment, the signature storage does not form part of the compression engine. Other variants are applicable, all as required and appropriate. Note also that whilst for convenience the description below focuses on the compression engine, those versed in the art will readily appreciate that this is a non-limiting example of a processor that is configured to perform the operations in accordance with various embodiments of the invention. The invention is not bound to any particular processor and accordingly a processor in the context of the invention may encompass a distinct processor, a plurality of processors, or other variants for performing the processing operations in accordance with the various embodiments of the invention.

In operation, a segment (referred to interchangeably also as a block) that was received e.g. from a given host, say 21 (of FIG. 2), is partitioned into sub-segments in module 32. For every sub-segment a signature is calculated in signature calculation module 33, using a signature function of the kind discussed above. Next, it is necessary to ascertain if the calculated signature is identical to its corresponding stored signature in database 34. To this end, the old (i.e. stored) signature that corresponds to this particular sub-segment (if it exists) is retrieved efficiently from the database 34, using e.g. caching techniques and/or context switching (as will be explained in greater detail below). If the old signature does not exist (signifying that the current sub-segment is new), then module 32 triggers transmission of the new sub-segment through network gateway 35 and the WAN 26 to the remote site, and the calculated signature is stored in the signature database 34. Obviously, the new sub-segment is stored in storage 23 (see FIG. 2). It should be noted that for any described embodiment, sub-segments that are transmitted over the network (say 26 in FIG. 2) may be subject to known per se compression techniques, such as Lempel-Ziv based coding, or other techniques. Accordingly, whenever reference is made to transmission of sub-segments it may apply to a derivative thereof, such as the specified non-limiting example of data compressed using e.g. Lempel-Ziv-based coding, Lempel-Ziv-Welch coding or Huffman coding.

Alternatively, if a corresponding old signature is found in the signature database 34, this signifies that this sub-segment already exists and what remains to be done is to ascertain whether it has been modified (in which case it should be transmitted) or has not been modified (in which case there is nothing to be done). To this end, the old signature is retrieved and compared (in module 32) to the so calculated signature (that corresponds, as recalled, to the sub-segment under consideration). If the signature values differ, this signifies that the newly arriving sub-segment has been modified (compared to the currently stored version thereof), and that accordingly it (i.e. the modified sub-segment) should be transmitted through network gateway 35 to the remote site. The newly calculated signature is stored in the signature database 34 and, obviously, the modified sub-segment is stored in storage 23.

Lastly, if the so retrieved signature and the newly calculated signature are identical, this signifies, with a high degree of certainty, that the sub-segment has not been changed and that accordingly there is no need to transmit it to the remote site and, obviously, the need to store it and its corresponding calculated signature is obviated.

Note that in the latter scenario (i.e. identical signatures), there is a small probability of mistake, i.e. that different sub-segment values will nevertheless be mapped to the same signature value. This error is inherent to the signature function; however, for all practical purposes it is negligible. Generally speaking, the chance of a mistake per sub-segment is of the order of 1 over 2 to the power of the number of signature bits. For instance, when using a 64-bit signature, this error is of the order of 5E-20, which is negligible.

Note also that in the latter example (i.e. sub-segment of 1 KB and signature of 64 bits), the memory required for storing all the signatures of, say, a 1 TB disk is about 8 GB, which can be easily stored on standard disk systems. The invention is, of course, not bound by any specific block size, sub-segment size and signature size. Whilst normally a block accommodates two or more sub-segments, in certain embodiments it may include only one, i.e. the block itself constitutes a sub-segment.
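Both estimates, the per-sub-segment collision probability and the size of the signature storage, follow from simple arithmetic. The sketch below assumes that 1 TB is taken as 2**40 bytes and 1 KB as 1024 bytes; the variable names are illustrative.

```python
SIGNATURE_BITS = 64
SUB_SEGMENT_BYTES = 1024
DISK_BYTES = 2 ** 40  # assumption: 1 TB taken as 2**40 bytes

# Chance that two different sub-segments map to the same signature.
collision_probability = 2.0 ** -SIGNATURE_BITS

# One signature per sub-segment on the disk.
sub_segment_count = DISK_BYTES // SUB_SEGMENT_BYTES
signature_storage_bytes = sub_segment_count * SIGNATURE_BITS // 8

print(f"collision probability per sub-segment: {collision_probability:.1e}")
print(f"signature storage: {signature_storage_bytes / 2 ** 30:.0f} GB")
```

Under these assumptions the disk holds 2**30 sub-segments, each needing 8 bytes of signature, which gives the 8 GB figure quoted above.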

The invention is likewise not bound to the specific embodiments described with reference to FIG. 2 to FIG. 4. For example, hosts of the same or different types may be used, and the communication medium is not bound to LAN 24 or WAN 26 or to any specific storage architecture 23 or 34. Other variants, also in respect of the specific modules depicted in FIG. 3, are applicable, all as required and appropriate. Note also that a remote site is not necessarily bound to a distinct remote storage or a distinct geographical site. Thus, a remote site encompasses one or more remote storages located in one or more remote geographical sites.

A sequence of operation in accordance with an embodiment discussed above is also shown in the flow chart of FIG. 4. Thus, every data segment is partitioned into sub-segments. The signature of each sub-segment is calculated. For every sub-segment it is checked whether its signature appears in the available signatures list. If it does, then the new signature and the old signature are compared. If both signatures are equal then nothing is done and the sub-segment is not transferred. Otherwise, if either the signature differs or it is not available, then the sub-segment is transferred over the communication medium and the signature is stored in the signature storage.
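The sequence just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the signature is a truncated SHA-1, the signature storage is a plain dictionary keyed by (block identifier, sub-segment index), sub-segments have a fixed size of 1 KB, and the transmit callback stands in for the communication medium. All names are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple

SUB_SEGMENT_SIZE = 1024  # assumed fixed sub-segment size (1 KB)

Key = Tuple[int, int]  # (block identifier, sub-segment index)


def signature(data: bytes) -> bytes:
    """Hypothetical 64-bit signature: first 8 bytes of a SHA-1 digest."""
    return hashlib.sha1(data).digest()[:8]


def process_block(block_id: int, block: bytes,
                  stored: Dict[Key, bytes],
                  transmit: Callable[[Key, bytes], None]) -> int:
    """Transmit only new or modified sub-segments; return how many were sent."""
    sent = 0
    for offset in range(0, len(block), SUB_SEGMENT_SIZE):
        sub_segment = block[offset:offset + SUB_SEGMENT_SIZE]
        key = (block_id, offset // SUB_SEGMENT_SIZE)
        new_sig = signature(sub_segment)
        if stored.get(key) == new_sig:
            continue                    # signatures equal: nothing to do
        transmit(key, sub_segment)      # signature missing or different
        stored[key] = new_sig
        sent += 1
    return sent


# Usage: the first write sends everything, the second only what changed.
stored_signatures: Dict[Key, bytes] = {}
outbox = []


def send(key: Key, data: bytes) -> None:
    outbox.append((key, data))


block = bytearray(32 * 1024)
first_pass = process_block(7, bytes(block), stored_signatures, send)
block[5] = 0xFF        # change in the header area (sub-segment 0)
block[20000] = 0xAA    # change inside the block (sub-segment 19)
second_pass = process_block(7, bytes(block), stored_signatures, send)
print(first_pass, second_pass)
```

On the first pass every sub-segment is new, so all 32 are sent; on the second pass only the two sub-segments containing the modified bytes are sent.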

As specified above, in accordance with the invention, data (such as sub-segments) are transmitted over the WAN (e.g. 26 of FIG. 2) whenever necessary. The system of the invention may be utilized for various applications, such as:

Data Replication: In data replication there are at least two volumes which essentially keep the same data, with one volume possibly less updated due to transmission time. There are three common modes for replication. In the first, synchronous mode, both volumes are exactly the same at all times. This mode requires continuous update, i.e. for every modification in the first volume, the second volume should be updated accordingly, with substantially no delay. In the second, asynchronous mode, both volumes are almost the same, with allowed inconsistencies measured in time or number of writes. In the third, snapshot mode (referred to also as point-in-time), the two volumes are not the same, but are synchronized to be the same once in a while. Note that in the second and third modes the remote volume is not updated for a given time interval, until the next update occurs. Whilst for convenience the description herein refers to a volume, it is of course not bound to any specific structure or content of the storage.

In any of the specified modes, only new sub-segments or sub-segments which were modified are transmitted to the other volume.

Backup: This is essentially a one-time operation where all the data is moved from one place to another. Often, the data is moved repeatedly to the same location, and accordingly the invention can be used for backup purposes since the data contained in the two volumes may be similar. Here also, only new sub-segments or sub-segments which were modified are transmitted to the other volume.

Data Migration: In data migration a volume is copied to a new site where the current data is most likely very different. Accordingly, the technique of the invention can be used in order to identify repetitions in sub-segments, and if such repetitions are detected there is no need to transfer the entire sub-segment again (to the remote site), but rather a derivative thereof in the form of a short code. Here also, only new sub-segments or sub-segments which were modified are transmitted to the remote site.

The invention is not bound by the specific implementations in respect of each of the above applications and accordingly other replication, backup and data migration implementations may be applicable. Moreover, it may also be utilized in other applications, all as required and appropriate.

Reverting now to the operation of various embodiments of the invention, as was explained above, it is desired to employ an efficient retrieval of signatures from the signature database 34 in order to avoid undesired overhead insofar as the system performance is concerned.

As may be recalled, when a calculated signature is compared to a stored signature (in a manner described above in detail with reference to FIGS. 2-4), the system performance may be adversely affected due to the need to access the slow signature storage (such as the 8 GB disk or disks that accommodate the signature database) and find the signature that corresponds to the so calculated signature. Accordingly, by one embodiment, in order to improve the system performance, a fast storage (referred to occasionally also as memory), e.g. cache memory, is used in order to pre-fetch from the slow storage into the fast storage a group of signatures that comply with a given criterion. By a non-limiting example, the criterion is to load and store in the fast memory signatures of frequently used sub-segments. Thus, there are high prospects of locating in the fast memory a signature that corresponds to a calculated signature of a frequently used sub-segment rather than accessing the slow storage, thereby obviously improving system performance. Such frequently used sub-segments are regularly found in various applications, including bank applications. The more signatures that are found in the fast storage, the less the need to access the slow storage and the better the system performance. Note, incidentally, that in this context, pre-fetching (referred to occasionally in other terms in the description) refers to the operation of loading data from the slow storage to the fast storage.

For a better understanding of the foregoing, attention is now directed to FIG. 5, illustrating a flow chart of the operational steps of the signature calculation and retrieval process, in accordance with an embodiment of the invention. Thus, a signature is calculated in respect of a sub-segment under consideration (51 and 52). Ignoring for a moment inquiry 53 and step 54 (which will be discussed in more detail below), it is tested whether the signature resides in either the fast memory or the slow memory (55) and, if in the affirmative, it is fetched from the fast memory or the slow disk (56) (as the case may be) and compared to the so calculated signature (57); in the case of a match, there is nothing to be done and the next sub-segment (or block) is processed (58). Reverting now to inquiry (55), in the case that the signature is found neither in the fast memory nor in the slow disk, this indicates that the sub-segment under consideration is new, and that it (or a derivative version thereof) should be transmitted to the remote site (59) and that the calculated signature should be stored in the signature database.

Turning now to inquiry 57, in the case of a mismatch, there is a need to transmit the currently processed sub-segment or derivative thereof (59).
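The retrieval path of FIG. 5 (leaving aside inquiry 53 and step 54) can be sketched with a small two-tier store. Several choices below are assumptions made for illustration and are not taken from the description: the fast storage is modelled as a least-recently-used cache rather than a frequency-based policy, the slow storage is a plain dictionary, and writes go through to the slow storage immediately. The class and function names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Optional


class SignatureStore:
    """Slow signature storage fronted by a small fast cache (LRU assumed)."""

    def __init__(self, slow: Dict[Hashable, bytes], cache_capacity: int):
        self.slow = slow                      # stands in for the slow disk
        self.cache: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.capacity = cache_capacity
        self.slow_reads = 0                   # counts accesses to slow storage

    def lookup(self, key: Hashable) -> Optional[bytes]:
        if key in self.cache:                 # found in fast memory
            self.cache.move_to_end(key)
            return self.cache[key]
        self.slow_reads += 1                  # fall back to slow storage
        sig = self.slow.get(key)
        if sig is not None:
            self._remember(key, sig)
        return sig

    def store(self, key: Hashable, sig: bytes) -> None:
        self.slow[key] = sig                  # write-through for simplicity
        self._remember(key, sig)

    def _remember(self, key: Hashable, sig: bytes) -> None:
        self.cache[key] = sig
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used


def must_transmit(store: SignatureStore, key: Hashable, new_sig: bytes) -> bool:
    """True when the sub-segment is new or modified (steps 55-59)."""
    old_sig = store.lookup(key)
    if old_sig == new_sig:
        return False                          # match: nothing to be done
    store.store(key, new_sig)                 # new or modified: remember it
    return True
```

The slow_reads counter makes the benefit visible: repeated lookups of a cached key leave it unchanged, while a key evicted from the cache costs one more slow access.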

Note, generally, that the term fast memory (storage) does not necessarily imply any particular physical storage or associated memory management. It merely indicates that the fast storage is considerably faster than the external slow storage which stores the signature database. In the same manner, the system is not bound to any specific external storage or memory management. A typical, yet not exclusive, example of fast storage is cache memory. Note that the cache management itself (what to keep in memory and what on disk) may be implemented in several ways; by one embodiment, the cache is a write-back cache. Typical yet not exclusive examples of slow storage are a local hard disk, an external SCSI disk, or even the main system storage disk array.

By another improvement, there is further provided in the fast memory a list of the signatures of sub-segments that appear often. The list (which is not bound to any specific data structure realization) further stores short codes of these sub-segments. For example, a block of zeros is quite common, since zero padding of tail portions in the external storage is quite often used. Other non-limiting examples of blocks that are commonly repeated belong to headers, spreadsheets, formatted documents, email attachments etc.

Such sub-segments (and their respective codes) are well familiar also to the remote side, since, naturally, zero padded blocks are also stored in the remote side. Thus, the list stores the signature of such a zero padded sub-segment and a code. Thus, whenever there is a need to transfer a zero padded sub-segment (e.g. in the case that the currently stored non-zero content of a given sub-segment is padded by zeros), there is no need to send the sub-segment explicitly or even to compress it, but rather, if it is found that this is a commonly used sub-segment, the code thereof (which, as a rule, is very short compared to the size of the sub-segment or even of the compressed sub-segment) is transmitted, thus further improving system performance. This is illustrated in additional steps 53 and 54 of FIG. 5. The remote site, when receiving the code, accesses a corresponding database and fetches the sub-segment data that corresponds to this code. Note that the code may be, for example, the signature of the said sub-segment, or an identifier of the sub-segment.

Those versed in the art will readily appreciate that the specified embodiment is not bound by zero padded blocks, which were given for illustrative purposes only.
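A minimal sketch of the short-code idea follows, assuming that the code is the signature itself (one of the options mentioned above) and that both sides hold the same table of common sub-segments. The table name COMMON, the tuple-based message format and the truncated SHA-1 signature are hypothetical choices for illustration.

```python
import hashlib
from typing import Dict, Tuple

SUB_SEGMENT_SIZE = 1024  # assumed sub-segment size (1 KB)


def signature(data: bytes) -> bytes:
    """Hypothetical 64-bit signature: first 8 bytes of a SHA-1 digest."""
    return hashlib.sha1(data).digest()[:8]


# Table of commonly repeated sub-segments, assumed to be known to both the
# local and the remote side. Here it holds only the all-zero sub-segment.
ZERO_SUB_SEGMENT = bytes(SUB_SEGMENT_SIZE)
COMMON: Dict[bytes, bytes] = {signature(ZERO_SUB_SEGMENT): ZERO_SUB_SEGMENT}


def encode(sub_segment: bytes) -> Tuple[str, bytes]:
    """Local side: send a short code when the sub-segment is a common one."""
    sig = signature(sub_segment)
    if sig in COMMON:
        return ("code", sig)          # 8 bytes instead of the full data
    return ("data", sub_segment)


def decode(kind: str, payload: bytes) -> bytes:
    """Remote side: expand a code back into the sub-segment it stands for."""
    if kind == "code":
        return COMMON[payload]
    return payload


kind, payload = encode(ZERO_SUB_SEGMENT)
print(kind, len(payload), decode(kind, payload) == ZERO_SUB_SEGMENT)
```

Matching on the signature alone inherits the small collision probability discussed earlier; a stricter variant could also compare the data against the table entry before choosing the code.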

Having described a non-limiting example of implementing faster access by pre-fetching banks of signatures from the slower storage to the faster one, there now follows a brief description explaining how to access the signature database for the purpose of inquiring whether a calculated signature is stored in the signature database or not. This applies to signatures stored in both the faster storage and the slower storage. The invention is of course not bound by this particular implementation. Thus, in order to retrieve signatures from the fast or slow storage, the location of each signature should be efficiently determined. By this embodiment, the location of the signatures is coded as an Interval Tree (which is generally known per se). In this binary tree, leaves represent a continuous region in the memory or disk which contains the signatures of a continuous interval of sub-segments. The non-leaf nodes are of the form "sub-segments on the left side have an index bigger than some value". In order to locate a given signature of a sub-segment, all that is needed is to traverse the interval tree; if the leaf contains the address of the signature, then the location is found and the signature can be fetched, and if not then the signature is currently not stored in the system. For efficiency, the interval tree is kept as a balanced tree. Also, if possible, each leaf represents a long interval (the size of each interval is a track or more, which by one embodiment accounts for 32 sub-segments or more).
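The lookup described above can be sketched as a balanced binary tree over sorted, non-overlapping intervals. This is an illustrative sketch only: the class names, the 8-byte signature size and the convention that indices greater than or equal to the split value go to the right subtree are assumptions (the description leaves the orientation of the comparison open to the implementation).

```python
from dataclasses import dataclass
from typing import List, Optional, Union

SIGNATURE_BYTES = 8  # assumed size of one stored signature (64 bits)


@dataclass
class Leaf:
    first: int     # index of the first sub-segment covered by this leaf
    count: int     # number of consecutive sub-segments covered
    address: int   # storage address of the first signature in the region


@dataclass
class Node:
    split: int     # indices >= split are found in the right subtree
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def build(leaves: List[Leaf]) -> Tree:
    """Build a balanced tree from a non-empty list of leaves sorted by first."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return Node(leaves[mid].first, build(leaves[:mid]), build(leaves[mid:]))


def locate(tree: Tree, index: int) -> Optional[int]:
    """Return the address of the signature of sub-segment index, or None."""
    while isinstance(tree, Node):
        tree = tree.right if index >= tree.split else tree.left
    if tree.first <= index < tree.first + tree.count:
        return tree.address + (index - tree.first) * SIGNATURE_BYTES
    return None  # signature is currently not stored in the system


# Usage: three regions, with a gap between sub-segments 32 and 63.
tree = build([Leaf(0, 32, 0x1000), Leaf(64, 32, 0x2000), Leaf(96, 64, 0x3000)])
print(locate(tree, 5), locate(tree, 40), locate(tree, 100))
```

Because each leaf covers a whole run of sub-segments, the tree stays shallow even for a large signature database, which is the reason given above for preferring long intervals per leaf.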

Turning now to another embodiment, the system's performance can be improved by employing so-called context switching. Before turning to describe this improvement, there follows a short background discussion. Thus, as may be recalled, in replication which is not synchronous (e.g. asynchronous or snapshot modes) it is possible to delay the treatment of blocks for a given time interval. In other words, it is allowed to maintain a certain inconsistency between the first volume and a second remote volume. (Note that the description below refers to volumes for convenience only, and this is by no means binding.)

Bearing this in mind, it may also be noted that many storage sites employ multiple contexts. Consider, for example, a bank application where there may be many contexts such as an email server (first context), a financial transaction database (second context), etc. Note that in many storage systems, there is a clear distinction between applications in the sense that different applications use different volumes or partitions in the slow storage. In other words, the email server data resides in distinct volume(s) of the storage and the transaction database data resides in other volume(s) of the slow storage.

Moving on with the bank system example, in such an application, the bank may allow a limited inconsistency of, say, 30 minutes for the financial transaction context and 1 hour for the email server context (thus allowing the use of the less costly non-synchronous replication, rather than the more costly synchronous one). This means that in the case of system malfunction and loss of data in a main bank site (where the first volumes reside), the data may be recovered (on the basis of the stored data in the remote second volumes) to the extent that it reflects an update up to the last 30 minutes (or less) insofar as financial transactions are concerned, and up to the last 1 hour (or less) insofar as the email server is concerned.

Note also that, naturally, incoming data that arrive from the various applications (e.g. blocks of data originating from the email server and transaction database) do not, as a rule, comply with some well organized sequence. Thus, it may well be the case that from 5 arbitrarily incoming blocks, the first "belongs" to the email context, the second and third "belong" to the transaction database, the fourth "belongs" to the email context and the fifth "belongs" to the transaction database.

As has also been mentioned above in connection with the non-limiting embodiment described with reference to FIG. 5, in order to expedite performance, the fast memory (e.g. cache) is used to store data (i.e. stored signatures) pre-fetched from the signature (slow) storage, thereby facilitating faster comparison between the so calculated signature (of the sub-segment under consideration) and the stored signature (in the case that the latter is stored in the fast main memory) compared to the case where signature data is retrieved from the slow signature storage for the purpose of comparison. Obviously, considering that the fast memory, and in particular the cache, cannot accommodate the entire signature database (of, say, 8 GB), a policy is employed to decide which signatures to pre-fetch, all as was explained with respect to the non-limiting embodiment described with reference to FIG. 5.

Bearing all this in mind, a naive implementation may require processing the incoming blocks as they come. Since, however, and as specified above, there is no preliminary knowledge of the context to which each incoming block belongs, the fast memory to which signature data is loaded (using the policy discussed with reference to FIG. 5, or another one) should accommodate signatures from two and possibly more contexts. In certain embodiments this can be relatively easy to implement since, as specified above, signature data for each context resides in a distinct area [volume] in the slow signature storage. Thus, by this specific example, a first part of the fast memory is allocated to store signature data retrieved from the email context area of the slow signature storage and a second part of the fast memory is allocated to store signature data retrieved from the financial transaction context area of the slow signature storage. Obviously, the more contexts there are, the less area is allocated for each context in the fast memory.

Now, reverting to the naive implementation, and assuming the 5 blocks discussed above (the first belonging to email, the second and third to the transaction database, the fourth to email and the fifth to the transaction database), they are processed one at a time. Thus, at the onset, the first block (relating to email data) is processed in the manner specified, i.e. in accordance with one embodiment this includes dividing the block into sub-segments and, in respect of each sub-segment, calculating a signature, ascertaining whether the corresponding signature data resides in the main memory and, if so, applying the comparison and determining whether or not to transmit the sub-segment to the remote site, depending on the signature comparison result. If, however, the sought signature is not stored in the main memory, but rather is stored in the signature database in the slow memory, the signature should be retrieved and the comparison applied. Having completed the processing of the first block, the same procedure is applied to the second block (belonging to the transaction database). Note here that for the second block the other part of the memory is used, i.e., the one that stores transaction signature data. The procedure is repeated for each block in the manner specified.

Those versed in the art will readily appreciate that the naive approach suffers from various limitations. For one, for each block, only part of the (fast) memory is used. Thus, for the first block (email context), only the memory part that stores email signature data is used. Obviously, the prospects of finding the sought signature in the fast memory part that stores email signature data are smaller compared to a situation where a larger part of the fast memory could be exploited, necessarily entailing more accesses to the slow signature database and thereby adversely affecting the overall system performance. In addition, due to the switch between the contexts (e.g. in the latter example, switching between email/transaction contexts, depending on the context of the incoming block), there is additional overhead when accessing the slow signature database since, as specified above, each context may be stored in a different area of the storage, and moving frequently from one area of the storage to another renders the slow disk access even slower, thereby further adversely affecting the system performance. Note that in real-life scenarios there are, as a rule, more contexts and accordingly the system performance is further degraded.

It is noteworthy that the more contexts there are, the smaller is the part of the main memory that can be allocated for each context, thus further reducing the chance of finding the sought signature in the main memory and imposing undue overhead in accessing the slow signature storage.
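The two limitations of the naive approach can be illustrated with a short sketch. The function names, the context labels and the 8-unit memory figure are hypothetical and chosen only for illustration; the count of area switches assumes, as a worst case, that every block requires an access to the slow signature storage.

```python
def per_context_share(fast_memory_units, n_contexts):
    """Fast-memory share available to each context when it is split evenly."""
    return fast_memory_units // n_contexts


def area_switches(contexts):
    """Count how often consecutive blocks belong to different contexts,
    i.e. how often the slow storage would have to move to another area."""
    return sum(1 for prev, cur in zip(contexts, contexts[1:]) if prev != cur)


# The 5 incoming blocks of the example, in arrival order.
arrival = ["email", "tx", "tx", "email", "tx"]
# The same blocks processed context by context.
grouped = sorted(arrival)

print(per_context_share(8, 2))   # share per context with two contexts
print(area_switches(arrival))    # naive, block-by-block processing
print(area_switches(grouped))    # processing grouped by context
```

In this example the arrival order forces three switches between storage areas, whereas processing the same blocks grouped by context requires only one.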

Bearing all this in mind, a context switching application in accordance with one embodiment of the invention (with reference to FIG. 6) will now be described. The context switching application is particularly useful for non-synchronous update (e.g. the specified non-synchronous replication application), where it is permitted to maintain a certain inconsistency between the local and remote volumes of data. In this embodiment, a context splitter 61 splits the incoming blocks according to their contexts into distinct context buffers. In the example of FIG. 6, three distinct buffers 62 to 64 are shown. The invention is not bound to any specific manner of splitting the contexts, and in one simplified embodiment, the incoming blocks are identified according to their source (e.g. email, transaction database, etc.) and stored in their respective buffers. Now, assuming that blocks that belong to the first context 62 are processed (in accordance with the selection of context selector module 65), the incoming blocks of this context are retrieved in, say, FIFO fashion from the buffer 62 and are processed one at a time.
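By way of a non-limiting illustration, the splitting of incoming blocks into per-context FIFO buffers may be sketched as follows. The class name ContextSplitter, its methods and the context labels are hypothetical; the sketch assumes the simplified embodiment in which each block arrives tagged with its source.

```python
from collections import deque


class ContextSplitter:
    """Routes each incoming block to the FIFO buffer of its context."""

    def __init__(self, contexts):
        # One buffer per known context (cf. buffers 62 to 64 of FIG. 6).
        self.buffers = {name: deque() for name in contexts}

    def route(self, source, block):
        """Store the block in the buffer that matches its source."""
        self.buffers[source].append(block)

    def next_block(self, context):
        """Retrieve the oldest block of the selected context, or None."""
        buf = self.buffers[context]
        return buf.popleft() if buf else None


splitter = ContextSplitter(["email", "transactions"])
arrivals = [
    ("email", b"e1"),
    ("transactions", b"t1"),
    ("transactions", b"t2"),
    ("email", b"e2"),
    ("transactions", b"t3"),
]
for source, block in arrivals:
    splitter.route(source, block)
```

Once the blocks are routed, repeated calls to next_block("email") return the email blocks in arrival order, while the transaction blocks wait in their own buffer.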

Note that incoming blocks that belong to the currently non-selected contexts are stored in their respective buffers 63 and 64 and will be processed later. This necessarily entails that there will be a delay in processing them (i.e. the blocks stored in buffers 63 and 64) and in identifying whether or not there is a change in these blocks that requires transmitting an update to the remote side. However, as may be recalled, in non-synchronous applications (such as the specified non-synchronous replication), a delayed update is permitted (according to the maximal permitted delay prescribed by the replication policy), and what is required is to assure that the delay in processing these blocks will not exceed the maximal permitted delay and that blocks are retrieved and processed before buffer overflow is encountered. These constraints can be adequately handled by the context selection module, which will switch context before the specified violations occur. Note that the context selection module is not bound by the specified decision policies, and accordingly others may be employed, depending upon the particular application.
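One possible decision policy for the context selection module, honouring the two constraints just described, may be sketched as follows. The names BufferState and select_context, the safety margins of 0.8 and 0.9, and the tie-breaking rules are assumptions made for illustration only; the description does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class BufferState:
    name: str
    queued_blocks: int     # blocks currently waiting in the buffer
    capacity: int          # blocks the buffer can hold
    oldest_wait_s: float   # time the oldest queued block has waited


def select_context(current, states, max_delay_s,
                   delay_margin=0.8, fill_margin=0.9):
    """Return the name of the context buffer to process next."""
    # A buffer is urgent when it nears the maximal permitted delay
    # or is close to overflowing.
    urgent = [
        s for s in states
        if s.queued_blocks > 0 and (
            s.oldest_wait_s >= delay_margin * max_delay_s
            or s.queued_blocks >= fill_margin * s.capacity
        )
    ]
    if urgent:
        return max(urgent, key=lambda s: s.oldest_wait_s).name
    # Otherwise stay on the current context while it still has work,
    # so that its signatures remain useful in the fast memory.
    cur = next((s for s in states if s.name == current), None)
    if cur is not None and cur.queued_blocks > 0:
        return current
    # Current buffer drained: move to the fullest non-empty buffer.
    nonempty = [s for s in states if s.queued_blocks > 0]
    if nonempty:
        return max(nonempty, key=lambda s: s.queued_blocks).name
    return current
```

Staying on the current context unless a violation is imminent reflects the stated aim of switching context only before the permitted delay or the buffer capacity would be exceeded.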

Reverting now to FIG. 6, and as further shown, the slow signature storage is split into distinct areas 67 to 69 according to the respective contexts. Note that for convenience they are shown as distinct modules, but in reality the distinct areas may be separate parts of the same storage.

Now, when a given context buffer is selected (say, 62), the appropriate signature database is accessed (say, 67, storing signature data for context 1) and signatures are pre-fetched therefrom and stored in a large portion of the (fast) memory space that is allocated for signature data.

It is important to note that whereas in the specified naive approach only part of the fast memory was utilized for a given context (leaving the remaining parts to other contexts), in accordance with the non-limiting context switching embodiment described herein, the parts of the fast memory that were previously allocated to other contexts (in the naive implementation) can be utilized to store data of the currently processed context, since blocks from the same context will be continuously processed (i.e. one block after the other, all extracted from the same context buffer) until the processing is switched to another context, under the control of the context selector 65. Note that because a larger (fast) memory space is used for this particular context (compared to, say, the naive approach), the prospects of locating the sought signature in the fast memory are considerably increased, thus reducing the rate of access to the slow signature database and thereby considerably improving the system's performance. Note also that throughout the processing of the same context, whenever there is a need to access the slow database (if the sought signature is not found in the fast memory), the access is always made to the same area (e.g. 67), obviating the additional overhead of switching between the different storage areas, as is the case in the specified naive approach, which, as may be recalled, necessitates switching to different areas of the storage depending on the context of the currently processed block.

Reverting now to the switch context processing, by this embodiment the processing of each block (as extracted from the context buffer) may be, e.g., in a manner similar to that discussed with reference to FIGS. 2-4 above, and the decision as to which signatures to load and store in the main (fast) memory may be, e.g., in accordance with the policy described with reference to FIG. 5. When the context selector switches to a different context buffer (say, 63), the procedure is repeated in respect of the blocks that belong to this context, and so forth. Obviously, whilst processing the blocks of the newly selected context, the incoming blocks that belong to the previously processed context are accumulated in their context buffer until the latter is re-selected by the context selector.
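The processing of the blocks of one selected context may be sketched, in a simplified and non-limiting manner, as follows. Several details are assumptions made for illustration and are not taken from the description: the signature is a SHA-1 digest truncated to 64 bits, signatures are keyed by sub-segment address, the sub-segment size is 2048 bytes, and the pre-fetch policy simply loads the first cache_capacity entries of the context's area (the actual policy being the one discussed with reference to FIG. 5).

```python
import hashlib
from itertools import islice


def signature(sub_segment: bytes) -> bytes:
    """64-bit signature (assumption: truncated SHA-1 digest)."""
    return hashlib.sha1(sub_segment).digest()[:8]


def process_context(blocks, slow_area, cache_capacity, sub_size=2048):
    """Process all buffered blocks of one context.

    blocks     -- list of (base_address, data) taken from the context buffer
    slow_area  -- dict address -> signature; stands in for this context's
                  area of the slow signature storage
    Returns the sub-segments to transmit and the number of slow accesses.
    """
    # Pre-fetch into the whole fast memory, which is devoted to this context.
    fast = dict(islice(slow_area.items(), cache_capacity))
    to_send, slow_reads = [], 0
    for base, block in blocks:
        for off in range(0, len(block), sub_size):
            addr = base + off
            sub = block[off:off + sub_size]
            sig = signature(sub)
            stored = fast.get(addr)
            if stored is None:
                # Not in fast memory: consult the slow storage area.
                slow_reads += 1
                stored = slow_area.get(addr)
            if stored != sig:
                # New or modified sub-segment: transmit and record signature.
                to_send.append((addr, sub))
                slow_area[addr] = sig
                if addr in fast or len(fast) < cache_capacity:
                    fast[addr] = sig
    return to_send, slow_reads
```

When the context selector switches to another buffer, the same function would be called with that context's blocks and storage area, so that the fast memory is refilled with signatures of the newly selected context.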

Those versed in the art will readily appreciate that the present invention is not limited to a separate device. The compression engine may be software/hardware based and reside on each of the nodes that use the storage sub-system. In such an architecture the network gateway is also part of the host.

There follows now a brief overview of three non-limiting system architectures. In the first architecture, shown in FIG. 7A, the compression engine 71 and its resources (memory, disk, network connection, and CPU) for calculating the signatures and storing them reside in the host computer 72. In this architecture the compression engine runs as a software component.

In a second architecture (illustrated in FIG. 7B), some work is performed in the host computer and some work is performed in a separate computer where the compression engine runs. Specifically, it is fairly natural to perform the signature calculation (73) in the host and then to send the calculated signature and its associated sub-segment to the compression engine (74), as well as the sub-segment itself to the disk for storage. The signature matching and signature database management are performed in the compression engine.

In accordance with a third embodiment (see FIG. 7C), the host transfers the sub-segment both to the disk for storage and to a separate compression engine (75) computer, which performs all the operations, including signature calculation, signature retrieval, comparison and signature database management operations (including caching and/or context switching, if applicable). If desired, two or more of the specified modes may be operated in the same system, which may switch between the respective modes depending on a decision criterion, such as load balancing.

Note that the invention is by no means bound by these specific embodiments, described with reference to FIGS. 7A-C, and accordingly other variants are applicable, all as required and appropriate.

By another embodiment, in the case that certain rules are violated, say the space required to allocate the signatures exceeds the available storage space or, say, certain corruption in the signature database is encountered, the compression engine operation may be temporarily circumvented, giving rise to a mode of operation where incoming sub-segments are transmitted as is (or in compressed form) to the remote site, thereby not causing any damage due to loss of data. Once the malfunction is overcome, the operation of the compression engine is resumed and continued in the manner specified above. The net effect is that even in system malfunction or other pre-defined operational scenarios, no loss of data occurs, at the cost of temporarily degraded system performance.

It will also be understood that the system according to certain embodiments of the invention may be a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing the method of the invention.
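The temporary circumvention of the compression engine described in this embodiment may be sketched as follows. The exception SignatureStoreError, the class DictStore and the function handle_sub_segment are hypothetical names; the signature is assumed, for illustration only, to be a SHA-1 digest truncated to 64 bits, and a single failure flag stands in for both the out-of-space and the corruption conditions.

```python
import hashlib


class SignatureStoreError(Exception):
    """Signature storage is full or corrupted (hypothetical condition)."""


class DictStore:
    """In-memory stand-in for the signature storage."""

    def __init__(self, healthy=True):
        self.data, self.healthy = {}, healthy

    def _check(self):
        if not self.healthy:
            raise SignatureStoreError("signature storage unavailable")

    def get(self, addr):
        self._check()
        return self.data.get(addr)

    def put(self, addr, sig):
        self._check()
        self.data[addr] = sig


def signature(sub_segment: bytes) -> bytes:
    return hashlib.sha1(sub_segment).digest()[:8]  # assumed 64-bit signature


def handle_sub_segment(addr, sub, store, send):
    """Transmit a sub-segment unless its stored signature shows no change."""
    try:
        unchanged = store.get(addr) == signature(sub)
    except SignatureStoreError:
        send(addr, sub)          # bypass mode: transmit as is, no data lost
        return "bypassed"
    if unchanged:
        return "skipped"
    send(addr, sub)
    try:
        store.put(addr, signature(sub))
    except SignatureStoreError:
        return "bypassed"        # data already sent; signature not recorded
    return "sent"
```

While the store is unavailable every sub-segment is transmitted, so correctness is preserved at the cost of redundant traffic; once the store recovers, unchanged sub-segments are again suppressed.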

Note that regardless of the embodiment under consideration, the remote site receives the transmitted sub-segment (with an associated address) and stores it in the database (say, a replicated copy in the case of a replication application), all as known per se. In those cases where a compressed or coded sub-segment is received at the remote site, it first derives the sub-segment and stores it, again as known per se.

The present invention has been described with a certain degree of particularity, but those versed in the art will readily appreciate that various alterations and modifications can be carried out without departing from the scope of the following Claims:

1. A system for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks of data; the system comprising: storage for storing data in sub-segment boundaries, such that at least one sub-segment is accommodated in each block; said storage includes at least two storage buffers such that blocks originating from the same origin are stored in the same storage buffers; signature storage for storing data including signature data; each one of said sub-segments is associated with at least one signature; each signature has a signature size considerably smaller than its respective sub-segment size; the system includes a processor configured to perform at least the following, as many times as required: receiving a block and in the case it accommodates more than one sub-segment partitioning it into sub-segments; for each sub-segment in the block calculating at least one signature; determining whether calculated signature matches corresponding signature, if any, stored in the signature storage, and in case of no match indicating that the sub-segment is new or has been modified, transmitting the sub-segment or derivative thereof to at least one of said remote sites, and store the calculated signature in the signature storage, wherein said processor is further configured to selectively switch between said at least two storage buffers according to switching criterion, and wherein for each selected storage buffer, said signature processing is performed in respect of blocks that are stored in said storage buffer.
2. The system according to claim 1, wherein said signature storage includes slow storage and fast storage.
3. The system according to claim 2, wherein said fast storage includes cache memory.
4. The system according to claim 3, wherein said processor is configured to perform signature processing including: pre-fetch signatures from the slow storage to the fast storage according to a given criterion, and wherein said processor is configured to determine whether calculated signature matches corresponding signature, if any, stored in the fast signature storage or the slow signature storage.
5. The system according to claim 4, wherein said criterion being to pre-fetch signatures of frequently used sub-segments.
6. The system according to claim 1, wherein sub-segments for transmission are compressed to thereby constitute said derivatives of said sub-segments.
7. The system according to claim 2, wherein said fast storage further storing a list of commonly used sub-segments and an associated codes being each considerably shorter than the respective sub-segment, and in the case that a sub-segment that is to be transmitted belongs to said commonly used sub-segments, transmitting the code which constitutes said derivative of the sub-segment.
8. The system according to claim 1 used for data replication.
9. The system according to claim 1 used for backup.
10. The system according to claim 1 used for data migration.
11. A processor for operating in a system for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks of data; the system includes storage for storing data in sub-segment boundaries, such that at least one sub-segment is accommodated in each block; said storage includes at least two storage buffers such that blocks originating from the same origin are stored in the same storage buffers; the system further includes signature storage for storing data including signature data; each one of said sub-segments is associated with at least one signature; each signature has a signature size considerably smaller than its respective sub-segment size; the processor configured to perform at least the following, as many times as required: receiving a block and in the case it accommodates more than one sub-segment partitioning it into sub-segments; for each sub-segment in the block calculating at least one signature; determining whether calculated signature is identical to corresponding signature, if any, stored in the signature storage, and in case of no match indicating that the sub-segment is new or has been modified, transmitting the sub-segment or derivative thereof to at least one of said remote sites, and store the calculated signature in the signature storage, wherein said processor is further configured to selectively switch between said storage includes at least two storage buffers according to switching criterion, and wherein for each selected storage buffer said signature processing is performed in respect of blocks that are stored at the selected storage buffer.
12. A method for efficiently transmitting data from a first site to at least one remote site over a communication medium, the data includes blocks of data stored at a storage that includes at least two storage buffers such that blocks originating from the same origin are stored in the same storage buffers; the method comprising: receiving a succession of blocks and partitioning each to sub-segments, if required; processing the sub-segments and for each sub-segment calculating at least one signature; determining whether calculated signature is identical to a corresponding signature, if any, stored in a signature storage, and in case of no match indicating that the sub-segment is new or has been modified, transmitting the sub-segment or derivative thereof to at least one of said remote sites, and store the calculated signature in a signature storage, wherein said method further comprising selectively switching between storage buffers according to switching criterion, and wherein for each selected storage buffer said signature processing is performed in respect of blocks that are stored at the selected storage buffer.
13. A method for processing data to generate a compressed data for transmission from a first site to at least one remote site over communication medium, comprising: at the first site, processing successions of data portions and identify those portions which were changed; generating a compressed data that includes data portions which were changed, and transmitting the compressed data over the communication medium, wherein said data portions are stored at a storage having at least two storage buffers such that data portions originating from the same origin are stored in the same storage buffers and wherein said method further comprising selecting one of said storage buffers and said processing is performed in respect of data portions that are stored at the selected storage buffer.
14. The processor according to claim 11, wherein said signature storage includes slow storage and fast storage.
15. The processor according to claim 14, wherein said fast storage includes cache memory.
16. The processor according to claim 14, wherein said processor is configured to perform signature processing including: pre-fetch signatures from the slow storage to the fast storage according to a given criterion, and wherein said processor is configured to determine whether calculated signature matches corresponding signature, if any, stored in the fast signature storage or the slow signature storage.
17. The processor according to claim 16, wherein said criterion being to pre-fetch signatures of frequently used sub-segments.
18. The processor according to claim 14, wherein sub-segments for transmission are compressed to thereby constitute said derivatives of said sub-segments.
19. The processor according to claim 14, wherein said fast storage further storing a list of commonly used sub-segments and an associated codes being each considerably shorter than the respective sub-segment, and in the case that a sub-segment that is to be transmitted belongs to said commonly used sub-segments, transmitting the code which constitutes said derivative of the sub-segment.
20. The processor according to claim 11 used for data replication.
21. The processor according to claim 11 used for backup.
22. The processor according to claim 11 used for data migration.
23. A method according to claim 12 wherein said signature storage includes a slow storage and a fast storage and wherein said method further includes, for each selected storage buffer, pre-fetching signatures from the slow storage to the fast storage.
24. A method according to claim 12 wherein said signature storage includes a slow storage and a fast storage and wherein said method further includes pre-fetching signatures from the slow storage to the fast storage according to a given criterion.
25. A method according to claim 24 wherein said criterion being to pre-fetch signatures of frequently used sub-segments.
26. A method according to claim 12 wherein sub-segments for transmission are compressed to thereby constitute said derivatives of said sub-segments.
27. A method according to claim 23 wherein said fast storage further storing a list of commonly used sub-segments and associated codes being each considerably shorter than the respective sub-segment, and in the case that a sub-segment that is to be transmitted belongs to said commonly used sub-segments, transmitting the code which constitutes said derivative of the sub-segment.
28. The method according to claim 12 used for data replication.
29. The method according to claim 12 used for backup.
30. The method according to claim 12 used for data migration.